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Abstract 

The construction of decision-theoretic Bayesian designs for realistically-complex nonlinear models is 
computationally challenging, as it requires the optimization of analytically intractable expected utility 
functions over high-dimensional design spaces. We provide the most general solution to date for this 
problem through a novel approximate coordinate exchange algorithm. This methodology uses a Gaussian 
process emulator to approximate the expected utility as a function of a single design coordinate in a 
series of conditional optimization steps. It has flexibility to address problems for any choice of utility 
function and for a wide range of statistical models with different numbers of variables, numbers of runs 
and randomization restrictions. In contrast to existing approaches to Bayesian design, the method can 
find multi-variable designs in large numbers of runs without resorting to asymptotic approximations 
to the posterior distribution or expected utility. The methodology is demonstrated on a variety of 
challenging examples of practical importance, including design for pharmacokinetic models and design 
for mixed models with discrete data. For many of these models, Bayesian designs are not currently 
available. Comparisons are made to results from the literature, and to designs obtained from asymptotic 
approximations. 

Keywords: Computer experiments; Gaussian process emulator; high-dimensional design; Smoothing. 
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1 Introduction 


Bayesian design of experiments is a natural paradigm for many problems arising in the physical sciences and 
engineering, particularly those concerning the estimation of nonlinear models where design performance, 
as measured by classical optimality criteria, is dependent on the a priori unknown values of the model 
parameters. A decision-theoretic approach, reviewed by Chaloner and Verdinelli (19951, determines an 


optimal allocation of experimental resources via maximization of the expected utility 


U(S) =[[ 


u(S, 0, y)7r(y, 0 | 8) dyd0 . 


( 1 ) 


Here, the utility u(5,0,y) quantifies the experimenter’s gain from using design 8 £ V to obtain data y £ y 
assuming model parameter values 0 £ $, with the statistical model defined through the joint density function 
7r(y, 0 | 8 ) = 7r(y | 0, 8)tt(iJj). As an example, assume the ith response is modeled as y* = g{xi\ 0) + e* with 
the Xi defining values taken by a controllable variable, 0 being a vector of unknown parameters defining the 


mean response, and observation error £j ~ 7V(0, cr 2 ) (i = 1,..., n). Then 0 = (0 , a 2 ), 8 = (xi, 


Cn ) 1 


and likelihood 7r(y|0,<5) is a multivariate normal density function. The utility function u(5,0,y) will 
typically be a function of some posterior quantities of 0 (see Section [3T| ) . 

Selection of a fully-Bayesian optimal design 8 * = arg maxg eT) U(8) has traditionally been challenging 
for all but the most straightforward utility functions and models due to the high-dimensional and, typically, 
analytically intractable integrals in |I]). Some recent progress has been made using simulation-based method¬ 
ologies for low-dimensional problems, i.e. small numbers of controllable variables and/or small numbers of 
design points, see Ryan et al. ( |2016| ) and references therein. There are, however, no methods available 
for decision-theoretic Bayesian optimal, or near-optimal, multi-variable design for nonlinear models. The 
methodology in this paper fills this important gap, and is demonstrated on generic problems of practical im¬ 
portance including pharmacokinetic studies and experiments that produce discrete data. Previous attempts 
to obtain fully-Bayesian optimal designs for these types of experiment have been extremely limited. 


In a landmark paper for low-dimensional design problems, Muller and Parmigiani (1996) proposed selec¬ 


tion of a design by maximizing a surrogate function found by approximating U(8) for a small num ber, to, of 


designs using simulation, and then smoothing the resulting values, U(8i 


,U(8 m ). See also 


Jones et al. 


(2016) and Weaver et al. (2016). In essence, these approaches perform a computer experiment to construct 


a statistical emulator for the approximation U(8 ), a research area where there has been huge activity in 


recent years (see, for example, Dean et al. 2015 Section V). For an experiment with n runs and v variables, 
8 has nv elements. Therefore, application of this approach to design for multi-variable models suffers from 
a curse of dimensionality, requiring (i) the construction of emulators in very high dimensions; (ii) large, 
e.g. space-filling, designs composed of selections of points from an nx-dimensional space, leading to (iii) a 
prohibitive number of evaluations of U{8 ), particularly if U(8) is computationally expensive. 

Our approach overcomes these problems by building a series of one-dimensional emulators for the ap¬ 
proximated expected utility. We emulate U(8) = U(8i\ <Tq) as a function of only the ith “coordinate” (or 
element) Si conditional on 8^ = (Si, ..., ck_i, <5i+i, <5 n „) , the values of all coordinates excluding the *th 
(i = 1,... ,nv). When these emulators are combined with a continuous version of the coordinate exchange 
algorithm (Meyer and Nachtsheim, 19951, an effective and computationally efficient design selection method¬ 
ology results. Conditional, coordinate-wise, optimization is key to overcoming the curse of dimensionality 
described above. 

Until relatively recently, the usual approach to Bayesian design was to use a normal distribution as an 
asymptotic approximation to the posterior distribution of 0 (e.g. Chaloner and Larntz 1989). For standard 
utility functions (see Section 3.1.2), use of such a pseudo-Bayesian approach leads to the integrand in (|T|) 
no longer depending on the data y. The resulting integral, with respect to 0, typically has much lower 
dimension and can be approximated using efficient deterministic quadrature rules (Gotwalt et al. 2009). 
However, the appropriateness of such approximations for small experiments is open to question. 


For high-dimensional design, an alternative to the use of a normal approximation was suggested by Ryan 


et al. (2014). These authors combined the simulation-based approach of Muller (1999) (see also Muller 
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et al. 2004 and Amzal et al. 2006) with a dimension-reduction scheme to find designs for single-variable 


nonlinear models (v = 1) variables and a large number of runs. Designs were restricted to those formed from 
a sampling scheme defined via two parameters; for example, the initial design point and a spacing parameter. 
An optimal design in this subclass then consists of the best choices of these two parameters, a substantially 
easier optimisation problem to solve. 

In contrast to either applying an asymptotic approximation or restricting attention to a subset of the 
design space, both of which may result in the selection of inefficient designs with respect to the exact expected 
utility, we attempt to find optimal or efficient designs for the original problem across the whole design space 
via an approximate optimization scheme. These three different approaches are compared in Section [3j 

The remainder of the paper is organised as follows. In Section [2] we describe approximate coordinate 
exchange for finding decision-theoretic Bayesian designs, including the use of Monte Carlo integration and 
Gaussian process emulators to approximate the expected utility. The methods are applied to a range of 
challenging and practically relevant examples in Section [3] including models for which Bayesian design has 
previously been computationally infeasible. We summarise the advantages of our approach in Section [4] and 
highlight some ongoing work. 


2 Approximate Coordinate Exchange (ACE) 

We first establish some notation. Suppose that a design consists of n runs or points, each of which determines 
the settings of v controllable variables and results in a single observation of the response variable. Let D 
denote the n x v design matrix with fcth row cR. specifying the settings of the v factors in the fctli run 
(k = 1,..., n). Let q = nv, then the design may be represented as a q -vector 5 = vec (D) G V C M 9 , where 
vec(-) denotes vectorisation via stacking the columns of a matrix and T> is the q-dinrensional design space. 

The proposed algorithm for decision-theoretic Bayesian design has two phases. Phase I applies a novel 
coordinate exchange algorithm where, for each coordinate, maximisation of U (<5) is replaced by maximisation 
of a surrogate function U(8). Phase I tends to produce clusters of design points that are very similar in the 
values of the controllable variables. Such clustering is common in heuristic design search (see also Gotwalt 


et al. 


2009). Hence, in Phase II, we check if the points in each cluster can be consolidated into a replicated 

ch. 12). Replication of points is 


design point using a point exchange algorithm (Atkinson et al. 2007 


common in optimal design for parametric models and a key principle of design of experiments (Wu and 


Hamada 2009 ch. 1). In Phase II, the candidate set is the design found from Phase I. The two phases form 


an approximate coordinate and point exchange algorithm which, for brevity, we call the ACE algorithm. 

In Section [iO] we define the ACE algorithm. For Steps [la]-[Tc] of the algorithm, we assume the availability 
of (i) a Monte Carlo approximation of the expected utility, 


B 


U(Si | S (i) ) = 0(8) = «(*> y I, iP t )/B , 


i=i 


with {y;, a random sample from the joint distribution with density 7r(y, ip \ 8 ); (ii) coordinate-designs 

,... G T>i at which we evaluate U(6i | <5(p), where T)j C R is the domain for the ith coordi¬ 
nate; and (iii) a suitable one-dimensional emulator, U(5i | <5(p), for U(8i \ 8 p)). Further details are given in 


Section |2l2| with examples in Section 2.4 


ACE is designed to solve a stochastic optimization problem, as only approximations to the expected 
utility are available formed as linear combinations of realisations of the random variable u(8, y, ip). As such, 
proposed changes to the design in Steps [lcT] and [3e] of the algorithm are accepted with probability derived 
from a Bayesian test of the difference in the means of Monte Carlo approximations to the expected utility 


for the current and proposed designs. Further details are given in Section 2.3 


2.1 The ACE algorithm 


0. Choose an initial design 5 U and set the current design 8 C = 8". 
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Phase I: coordinate exchange 


1. For i = 1 ,,q, 

(a) Select an m point coordinate-design & = {6},.. .,6™} G XV 

(b) Evaluate U(Sj \ Sfa),U(S? \ 8^). 

(c) Construct U(5i \ 6 by fitting a statistical model to , U(S {) j 

(d) With probability set Sf = SJ = argmax^.g-p. t/(^ | where 

BU(S cf ) -BU(S C )\ 


p\ = 1 — T 2 B-2 I — - 


^2Bvi 


( 2 ) 


Tib -2 is the probability distribution function for a t distribution with 2_B — 2 degrees of freedom, 
«5 c t = (£f,..., • ■ ■, and 


1 2 


Ef =1 «(<S Ct ,yUI)-^ Ct ) +£f=i u(5 c ,yf,tf)-U(S c ) 


1 2 


= 


2B-2 


for {yj, ip \}^ =1 and {yp, ipf }^ =1 independent random samples from 7r(y, ip | and 7r(y, ip | <5 C )> 
respectively. 


2. Repeat step [T] iV/times. 


Phase II (point exchange) 


3. (a) For k = 1,..., n, let 8^ = vec(Dp^), where 


dP } = 


(D C ) T (d?) 1 


and dp is the fcth row of D c , the design matrix for . 

(b) Let k) = argmax fc U(8^) and set D^ 2 ) = 

(c) For h = 1,..., n + 1, let 8^ = vec(D^), where 


D™ = 


(d< 2 >) T ...(d™ 1 ) T (d< 2 L) T ... (d™,)' 


l T 


and dP' ) is the hth row of D^ 2 P 


(d) Let /P = argmax ft U(8^) 


(e) With probability p\ D set 6° = <5^, where 


t ( BU(8^)-BU(5 C ) 

p n -l-T 2B . 2 \ 7 _- 


( 3 ) 


and 


£f=i Us$,y?\ri 3) ) - P(4 3 t } )l 2 + Ef=i k« c ,yp,^P) - U(8 C ) l 2 


I 'II = 


with {yp\ ip\ V) }f =l a random sample from 7r(y, ip \ S^J). 


2B-2 

(3b 
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4. Repeat step [3] Njj times. 

5. Return S* = S c . 


The decision on when to terminate a run of the algorithm, i.e. choice of Nj and Njj , is complicated by 
the stochastic nature of the approximation to the expected utility. For the examples in Section [3j Nj = 20 
and Nii = 100 are sufficient to achieve approximate convergenc e. H ere, convergence is assessed graphically 
from trace plots of U(S C ) against iteration number; see Section 


3.2 


for examples of such plots. 


To avoid local optima, the algorithm is run M times (in embarrassingly parallel fashion) with each 
run starting from a different, randomly-chosen, initial design S Q (a random Latin hypercube design, unless 
otherwise stated). The selected design, d*, is the design having the highest average approximate expected 
utility, averaged across C sets of Monte Carlo simulations. In this paper, M = C = 20 was used, unless 
otherwise stated. 


2.2 Emulation via computer experiments (steps [la| - |lc[ ) 

In Phase I of the algorithm, a sequence of one-dimensional emulators is constructed for U(Si \ 8^), i = 1,..., q 
(Step lc). A variety of smoothing or interpolation techniques could be applied to construct each emulator. 
Muller and Parmigiani (1996) used local polynomial regression to emulate low-dimensional design utilities. 


We adopt a Gaussian Process (GP) regression model (see, for example, Rasmussen and Williams. 2006), 
which is widely used for computer experiments, and use the posterior predictive mean as an emulator. Let 


fii = ^U{81 | <5(q)/m, 

i=i 

m 2 

= ««)-£*) /(ra-i), 

j=i 

and z(5i) = (u(5i\Sf^) — faij I(J% for any 5 r £ 2?,. The GP model assumes that any vector z(() = 

[^(J 1 ),... ,z(<5 m °)] T , for ( = {A 1 ,..., <5 m °} £ V ” l ° and integer mo, has joint distribution N (0 m „, A(£)), 
with O mo the mo zero-vector and A(£) an mo x mo covariance matrix. Hence, the posterior predictive mean 
of U(8 | 5(q) at an arbitrary <5 £ can be derived using standard results on the conditional distribution of 
normal random variables and used as an emulator: 


=A i + a i E[^)|zfe)] 

= jli + <7ja(5, £i) 1 A(£j) -1 z(£j). 


Under the common assumption of a squared exponential correlation function, A(£j) and a (5, £,;) have entries 

M&)st = exp {-p(6? - Si) 2 } +r]I(r = s), 
a(<5,6) s = exp{-p(^ s - <5) 2 } , 


for s,t = l,...,m, where 1(E) is the indicator function for the event E, and p,r] > 0 are unknown pa¬ 
rameters. The inclusion of nugget ry ensures the emulator will smooth, rather than interpolate, the Monte 
Carlo approximations of the expected utility. To limit computational complexity, at each iteration we find 
maximum likelihood estimates of p and ?y via Fisher scoring (see, for example, Pawitan 2001 pp. 174-177). 


In contrast, a fully-Bayesian approach would require application of a Markov chain Monte Carlo algorithm 
to construct each emulator, substantially increasing the computational cost of the algorithm. 

At each iteration of Step la a coordinate-design = (<5^,..., 5™) must be chosen at which to evaluate 
U(6i | 5(j)). We use a space-filling design, specifically a randomly-selected one-dimensional Latin lrypercube 
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design (see, for example, Santner et al. 2003 ch. 5), constructed by dividing Vi into m equally-sized sub¬ 
intervals, and then generating a point at random from each interval. We set m = 20, unless otherwise stated. 


This choice of m is conservative relative to the rule of thumb (Loeppky et al. 2009) of setting m equal to 10 


times the number of input dimensions (suggesting to = 10 in our case). We have, however, found it works 
well in practice for a variety of different types of examples, giving accurate emulators and not being overly 
computationally demanding. 

2.3 Adjusting a design coordinate (step [Id] ) or point (step [3e]) 

To make a change to the ith coordinate in Step |ld[ we first find Sj, the value of the coordinate that 
maximizes the emulator. We find the maximum by evaluating U (s | for 10,000 uniformly generated 
points in Vi. This discretization of the problem has proved both more reliable than continuous optimization 
and sufficiently computationally efficient. 

Choice of Sj is subject to both Monte Carlo error, from the evaluation of U(5i \ S (q), and emulator error 
from the estimation of U(6i | <5(q) resulting, for example, from an inappropriate choice of correlation function 


or errors in estimating p and p. It is clearly impossible to use usual residual diagnostics (Bastos and O’Hagan 


2009) to check emulator adequacy at each iteration of the algorithm. Instead, emulator error is eliminated 


from the decision to adjust a design coordinate by performing additional Monte Carlo integration to calculate 




the probability p\ in §■ This quantity is the posterior probability that E it(- if), y, <$ c ^) > E u(ip, y, 6 

under non-informative prior distributions and using Monte Carlo samples |?/>f,ypj and |i/;(,yj| , 

assuming both u(ip\ y^ ,<5^) and u(ip C , y c , S c ) are normally distributed with equal variances. See also 
Wang and Zhang (2006) for use of a classical hypothesis test in a simulated annealing algorithm. If this 
normality assumption were severely violated, a more sophisticated test procedure could be adopted at greater 
computational cost. 

A similar test is performed at Step 3e in Phase 2 of the algorithm to calculate p\j in p|) . We demonstrate 
the effect of Step [id] in Section |2.4| 


2.4 Illustrative example 


In this section, we illustrate the ACE methodology, in particular the combination of Steps lc and Id 


selecting and accepting a proposed change to the design. To enable assessment of the algorithm, we consider 
the analytically tractable problem of finding a one-point optimal design for the single-variable Poisson model 
y\f3 ~ Poisson(e^ a: ). There is a single design coordinate, 6 = x £ [—1,1], and hence our notation is simplified 
by replacing 6 by x in this example. A priori, we assume (3 ~ N(0.5,1) and adopt the utility function that 


leads to pseudo-Bayesian H-optimality (Section 3.1.2), given by 


u(/3,y,x) = logI(/3; x) 

= 2 log |x| + fix , 

where I(/3; x ) denotes the Fisher information. The expected utility is U{x) = 2 log |a;| +0.5a; and the optimal 
design is x* = 1. 

To simulate one iteration of Phase I of the ACE algorithm, we generate a coordinate-design ^ = 
{x 1 ,..., x m } as a Latin hypercube and, for each x- 7 , evaluate 

j b 

U{x 3 ) = 2log |cc J | + i 


i—1 


where 1/^}^ for B = 2 is a sample from a N (0.5,1) distribution. Figure 3 a ) shows U{x) plotted against 
x with the points {x^, U(x^)}JL X and the GP emulator U(x) superimposed (Steps la lb and lc). Clearly 
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Figure 1: Poisson example in Section 2.4 (a), (c) expected utility U(x) against x, with Monte Carlo 

evaluations U(x) at the coordinate-design points and GP emulator U(x)\ (b), (d) median probability p\ of 
accepting the candidate point against the current point, x c . [Coordinate-designs are: for (a), (b); £ 2 for 

(c), (d)]. 
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U(x) is maximised at a/ = 1 , and hence this candidate point should be compared to the current point x c 
(Step[id|. Figure [ljb) shows the median posterior probability, p\, of accepting this candidate point against 
x c , calculated from repeated calculation of © for multiple Monte Carlo samples. This probability is very 
close to one for nearly all values of x c except for x c « x\ where the probability reduces to 1/2. 

For a second coordinate-design, £ 2 (a different Latin hypercube), the results in Figures [ljc) andjljd) are 
obtained. Here, the GP emulator could be viewed as inadequate with the estimate of rj being too small, 
resulting in near interpolation of the U(x ■ 5 ). From Figurejljc), U(x) is maximised at x t = —1 and hence this 
becomes the candidate point. The median posterior acceptance probability, Figure [ljd), is now only close 
to one if U(x c ) is low, i.e. \x c \ < 0.5. Crucially, will be rejected with high probability if x c is close to 
the optimal design; at x c = 1 , the probability drops to zero. 

3 Substantive examples 

The ACE algorithm is now used to find decision-theoretic Bayesian designs for three important cases: a 
compartmental model, (hierarchical) logistic regression, and dose-response under model-averaging. The 
designs are found for commonly used utility functions and, where possible, compared to existing results. 

3.1 Utility functions 

In this section, we assess and compare designs found using variants on two utility functions, Shannon 
information gain (SIG) and negative squared error loss (NSEL). In practice, the form of the chosen utility 
function should be driven by the aims of the experiment and may often incorporate a cost function. We 
assume throughout that the model parameters can be expressed as ip T = ^ 0 T , 7 T ^, with 9 a p-vector of 

parameters of interest and 7 a (P — p)-vector of nuisance parameters. 

The SIG utility for 9 is given by 

u s {9,y,S) =log 7 r( 0 |y,d) -log 7 r( 0 ) 

= log 7 r(y| 0 ,<S)-log 7 r(y|d), (4) 

where © follows from an application of Bayes theorem and is often more useful for computation. A 
SIG-optimal design maximizes U s (8) = E^, !y [rt s (0,y, <$)]. This is equivalent to maximizing the expected 
Kullback-Liebler distance between the marginal posterior and prior distributions of 9, and is also equivalent 
to minimizing the expected entropy of the posterior distribution for 9. 

The NSEL utility for 9 is given by 


v 

u V (9,y,d) =-^2[e w -F,(9 w \y ,<5)] 2 . (5) 

W—l 

A NSEL-optimal design maximizes the expected utility U V (S), which is equivalent to minimizing the ex¬ 
pectation of the trace of the posterior covariance matrix of 9 with respect to the marginal distribution of 

y- 

3.1.1 Evaluating the expected utility via numerical approximation 

For many statistical models, including most nonlinear models, evaluation of u s (9,y,S) and u v (9, y, 6) 
requires numerical approximation. For given values of y and 9. the components of Q can be approximated 


where < Obi^b f is a size B random sample from the prior distribution of = ( 6 , 7 ). These quantities 
l ) b—1 

can be incorporated into a nested, or double-loop, Monte Carlo approximation of U s (8): 


U s (8) = — [log 7 r(y ; | di,6) - log 7 f(y 1 | 5)] , 


1=1 


with {y 9{\ l=1 a sample from the joint distribution of the response and parameters. Intuitively, the “inner 
sample” of size B is used to approximate the two marginal likelihoods in Q, the first marginal to 7 and the 
second to both 7 and 6 , and the “outer sample” of size B is then used to approximate the expected utility 
with respect to the joint distribution of y and 0. This approximation is biased for U s (8) due to the bias 
in log 7 f (y 10, «5) jmd lo g 7 r(y | 5 ). However, under regularity conditio ns sat isfied by most models of practical 


2000 


pp. 80-81), this bias is of order B 1 (Ryan 2003) and hence asymptotically 


importance (Severini 
negligible. 

For NSEL, E (9 W | y, 8) in ([ 5 ]) can be approximated via importance sampling: 


E(6» w |y, 8 ) = 


Ef=i^ 7r (y|^ ; 7b^) 

Ef=i 7r (y \8b,7bi8) 


1 B 


where j$ 

Hence, the following nested Monte Carlo approximation of the expected utility is obtained: 


b, 7 f, t is a random sample from the prior distribution of ip, and 9b w is the udh element of 6b . 
) 6=1 


u 


1 


w = h 

1 =1 w—l 


-E(9 w \yi,8) 


where 9i w is the wth element of 9i. Here, the inner sample is used to approximate the posterior expectation, 
and the outer sample used to approximate the expected utility. Importance sampling has commonly been 


used to estimate posterior quantities for Bayesian design (see Ryan et al. 2016 and references therein), 
although the approximation of the expected utility will again be biased due to bias in E {6 W \ y, 8) 2 . 


In the examples, we set B = B = 1000 for the evaluation of U(8i \ 8 { (.^) in Step lb of the ACE algorithm 


(chosen from practical experience). For the comparisons in Steps Id and 3e we set B = B = 20, 000. 


3.1.2 Evaluating the expected utility via normal approximation 

The following approximations to U s (8) and U v (8) are commonly used 
justified via a normal approximation to the posterior distribution of ip: 

(j) s (8) = E^ (log |J(0;5,7)1) = [ log\l(9;8,'Y)\Tr(ip)d'tp, 

Je 

<j) v (8) = —E^, [tr{X( 0 ; 5 , 7 ) -1 }] = - [ tr {1(0; 5, 7 ) -1 } Tr^jdi/*, 

Je 

with 1(6,8, 7 ) the Fisher information matrix for 6, or an approximation thereof. Designs that maximise 
cj) S and 4> v are sometimes referred to as pseudo-Bayesian D- and A-optimal designs, respectively. Note that 
these expressions also result from taking expectations of the utility functions 

u D (ip,y,8) = log \1(6; 5 , 7 )| , u A (ip,y,S) = -tr{I( 0 ; 5 , 7)} _1 , 

which do not depend on y. Unbiased Monte Carlo approximations to </> s (5) and <j> 1 (5) can be obtained via 
sampling from the prior distribution for xp: 

4>‘ S (<*) = ^X! l0 Sl :Z: ( 6 'b < *,7z)| I W • 

z =1 1=1 


Atkinson et al. 


2007 


ch. 18), 
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For comparison of designs, the D-efhciency of design <$i relative to design S 2 is defined as 


Eff D (<5i,<5 2 ) = 100 x exp 


| $ s (Si) - <f> s ( 8 2 ) /p} 


(6) 


3.2 Compartmental model 

Compartmental models are applied in pharmacokinetics to study how materials flow through an organism, 



is a nuisance parameter, a(-) and &(•;•) are application-dependent functions, and n( 0 ;ti) = exp (— OiU) — 
exp {— 0 2 ti). _ 


For this problem, Ryan et al. (2014) assumed that 


«(0) = 


40009 


0 3 (02-0i) ’ 


b{d; ti) — ( 1 + 


a{0) 2 n(6;ti) 


10 


cr 2 =0.1, 


and found designs using the SIG utility function. Independent log-normal prior distributions were assumed for 
the elements of 9 with, on the log scale, each having common variance 0.05 and expectations log(O.l), log(l) 
and log(20) for 6 1 , 0 2 and 63 , respectively. These authors also incorporated the constraint max Sit= i j ... i „ |t s — 
tt | > 0.25, i.e. that sampling times must be at least 15 minutes apa rt. It is straightforward to incorporate 


Id 


U(Si | is maximized over a set 


this constraint into design search using the ACE algorithm. In Step 
T>i that satisfies the constraint. Phase II of the ACE algorithm is then omitted as replicated sampling times 
are not permitted. 


Ryan et al. (2014) restricted their search for a SIG-optimal design to the class of designs defined via 


a dimension reduction scheme (DRS) that set the n sampling times to scaled percentiles of a Beta(ai,a 2 ) 
distribution. Hence, the design problem was reduced to selecting two parameters, a\ and a 2 . The |Mtiller| 
(1999) simulation algorithm was used to sample from an artificial posterior distribution for a±,a 2 , with 
unnormalized density equal to the integrand in ([I]). The chosen design was then the scaled quantiles from 
the Beta distribution obtained from using the posterior modal values of a\ and a 2 . 

We compare this design with three designs found from ACE: (i) a SIG-optimal design; (ii) a pseudo- 
Bayesian D-optimal design ; and (iii) an optimal choice of ai,a 2 for the Beta DRS. For this latter design, 
the sampling times are given by tj = 24 x Q ai,ci! 2 ^, with Q(r; ai,a 2 ) the rth quantile of the 

Beta(ai,a 2 ) distribution. In Step Id of the ACE algorithm, the sets and V 2 are given by 


Z>i 

V 2 


X G 


X G 


P+ • 


mm 

3=l,...,n-l 


mm 


n + 1 
3 

n +1 


,x,a 2 

,ai,x 


Q 

Q 


3 + 1 

n + 1 

i + 1 

n +1 


, x, a 2 


x 


0.25] 

> ”24”J ’ 
0.25] 

> ”24”J ' 


For the SIG- and D-optimal designs, Figures [2^a) and[2])b) show trace plots of the approximate expected 
utility at each iteration of the algorithm. Approximate convergence is demonstrated, for both utility func¬ 
tions, through each run of the algorithm resulting in a similar value of U{ 8 ) after a relatively small number 
of iterations. Convergence is, however, achieved more quickly for pseudo-Bayesian D-optimality, which does 
not require approximation of posterior quantities. This criterion also displays greater consistency in the final 
approximated expected utility between runs of the algorithm. 

The sampling times for the four designs, shown in Figure [2]jc), indicate that the designs using dimension- 
reduction do not display the clustering of points evident in the SIG and pseudo-Bayesian D-optimal designs. 
The boxplots in Figure [2^d) , from 20 evaluations of U s (S*) (B = 20,000) for each design, confirm that 
larger approximate expected utilities are obtained, up to a 5% improvement, when DRS is not used. Here, 
the pesudo-Bayesian D-optimal design provides a good approximation to the SIG-optimal design. 
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(a) (b) (c) 




(DRS) (DRS) 



SIG - Shannon information gain 
Bayes D-opt - Bayesian D-optimal 
ACE - approximate coordinate exchange 
DRS - dimension reduction scheme 



b ACE ± Ryan et al 


Figure 2: (a), (b) Trace plots of (7(<5 C ) for each iteration of the ACE algorithm for SIG and pseudo-Bayesian 
D-optimality utilities, respectively; in each plot, the black line shows the trace of the expected utility for 
the best design; (c) Designs found from the ACE algorithm: unrestricted SIG-optimal, pseudo-Bayesian 
-D-optimal, Beta DRS SIG-optimal, together with the Ryan et al. (20141 Beta DRS SIG-optimal designs; (d) 
Boxplots for 20 evaluations of U s (S*) for designs from these four methodologies; (e) Approximate expected 
utility surface for SIG as a function of the Beta DRS parameters; parameter values corresponding to the 


Ryan et al. (2014) and the ACE DRS designs are marked. 
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The DRS design found from ACE outperforms the Ryan et al. (2014) design. To explore this result 


further, the expected utility surface was investigated as a function of ay and a 2 by sampling 40,000 ( 01 , 02 ) 
pairs from [0, 5 ] 2 and evaluating U s (S) for each pair. The resulting expected utility surface is shown in 
Figure |2](e), where U s (5) = 0 for parameter pairs that do not satisfy the 15 minute constraint. Both 
methods identify the relatively small region of high expected utility; the sampling-based algorithm (Ryan 
et al. 2014| Miiller[ |~i~999) fails to identify the optimum point within this region. 


3.3 Logistic regression in four factors 

Fully-Bayesian design for multi-variable logistic regression has not appeared in the literature, although 


Hamada et al. (2001) found a SIG-optimal design for a single-variable model and Woods et al. (2006) were 
the first to find multi-variable pseudo-Bayesian H-optimal designs. Here, we find designs for a first-order 
logistic regression model in four variables where the response is measured for G groups of n g runs, i.e., 
n = Grig. Let y s t ~ Bernoulli(p st ) be the tth response from the sth group (s = 1,..., G\ t = 1,..., n g ), with 


Po + UJ S 0 + (/?1 + W s i)a;i s t + (P‘2 + <jj s 2)x2st + (/?3 + Us3)X3st 

+ (Pa + St 
x !t (/3 + u s ) , 

where (3 £ R 5 are the parameters of interest, and u> s € R 5 (i = s,... ,G) are the group-specific nuisance 
parameters (or “random effects”). Let X = (X^ • • ■ Xg) T be the n x 5 model matrix where X s is the n g x5 
matrix with tth row given by xj t . The design matrix D is formed as the last four columns of X, S = vec(D) 
has length q = 4 n, and T>i = [— 1 , 1 ] for i = 1 ,..., q. 

The following independent prior distributions for each element of /3 are assumed: 

Po ~ U[—3,3], Pi ~ U[4,10], P 2 ~ U[5,11], 

P 3 ~ U[— 6 , 0 ], P 4 ~ U[—2.5,3.5]. V) 

We find designs for two different prior distributions for each u> s (s = 1,..., G): (i) a prior point mass at 
ui s = 0 for all s, resulting in standard logistic regression with homogeneous groups; (ii) a hierarchical prior 
distribution in which the elements of u s are independent and identically distributed as uj sr ~ U[—A r , A r ], for 
r = 0 ,..., 4, with A r > 0 unknown and having triangular prior density 7 r(A r ) = 2 with (Lq, ■ ■ ■, L 4 ) = 

(3,3,3,1,1). 



3.3.1 Logistic regression with homogeneous groups 


We use ACE to find designs that maximize the SIG and NSEL expected utilities for homogeneous logistic 
regression with u) s = 0 and n = 6 ,..., 48. For comparison, we also find pseudo-Bayesian D- and A-optimal 


designs. We also compare to maximin Latin hypercube (LH) designs (Morris and Mitchell 1995). For this 
example, the starting designs for the algorithm were a locally D-optimal design (for SIG and Bayesian D) 
and a locally A-optimal design (for NSEL and Bayesian A ), found from ACE via maximization of ip s (S) or 
ili v (S), respectively, using a point prior distribution for each parameter with support at the mean of each 
prior distribution in (J7|. Figure[3]presents results (minimum, mean, maximum) of 20 evaluations of (a) U s (S) 
for the SIG-optimal, Bayesian D-optimal and maximin LH designs, and (b) —f/ v (S) for the NSEL-optimal, 
Bayesian A-optimal and maximin LH designs, using B = 20,000 Monte Carlo samples. For small n, on 
average there are substantial differences in expected utility between the fully-Bayesian and pseudo-Bayesian 
designs, with the SIG-optimal design having expected Shannon information gain up to 20% larger than the 
Bayesian JD-optimal design and the NSEL-optimal design having expected trace of the posterior covariance 
matrix up to 27% smaller than the Bayesian A-optimal design. For both SIG and NSEL, as n increases, the 
difference in expected utility between these designs and the pseudo-Bayesian designs decreases. For SIG, 
these findings agree with asymptotic results on the convergence, under certain regularity conditions, of the 
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posterior distribution to a normal distribution (see, for example, Gelman et al. 2014 pp. 585-588). The 
maximin LH designs, which are model-free space-filling designs, perform poorly under both SIG and NSEL 
utilities and are not competitive with the model-based designs. 

As there are no comparable results on fully-Bayesian design for multi-variable logistic regression in the 
literature, we compare the pseudo-Bayesian D-optimal designs for n = 16 and n = 48 found from ACE with 


designs from the approach of Gotwalt et al. (2009). We independently implemented the methodology of 


these authors to obtain designs for n = 16 and n = 48; we also compare to the n = 16 run design published 
by Gotwalt et al. (2009). For each of these three designs, we calculated the average of D-efficiency ([6]) 
over 20 Monte Carlo approximations (each with B = 20, 000) relative to the appropriately-sized design 
from ACE. The published 16-run design has average efficiency of 82%; the designs from our implementation 
perform similarly to the ACE designs, with average efficiencies of 99.9% and 101.3% for n = 16 and n = 48, 
respectively. 


3.3.2 Hierarchical Logistic Regression 

For hierarchical logistic regression, we again find SIG-optimal and NSEL-optimal designs, along with pseudo- 
Bayesian D- and A-optimal designs using an approximation to the Fisher information (Pawitan, 2001 p. 
467). We set n g = 6 and G = 2,..., 8, leading to n = 12,14,... ,48. To reduce the computational burden, 


B = 1000 was used in Step 3e to find SIG-optimal designs. Previous research has found pseudo-Bayesian 


.D-optimal designs for smaller numbers of variables and group sizes (Waite and Woods 2015). 

Figures Glc) and (d) show results from 20 evaluations of U s (i 5) and — U v (6) for the SIG-optimal and 


pseudo-Bayesian .D-optimal designs, and the NSEL-optimal and pesudo-Bayesian A-optimal designs, respec¬ 
tively. Again, the performances of maximin LH designs are included for reference (see figure caption for 
details). A comparison with Figures [3ja) and (b), respectively, shows lower expected gains in Shannon 
information and higher expected posterior variance for the hierarchical logistic regression model due to ad¬ 
ditional uncertainty introduced by the group-specific parameters. As with designs for homogeneous logistic 
regression, the difference in expected utility between the pseudo-Bayesian designs and the fully-Bayesian 
designs decreases as n increases, and the LH designs perform poorly. 


3.4 Binomial regression under model uncertainty 

Uncertainty over the choice of statistical model 7r(y, ?/> | S) is common in practice, and has been addressed 


in pseudo-Bayesian design for generalized linear models by Woods et al. (2006). To demonstrate Bayesian 


optimal design under model uncertainty, we find follow-up designs for the beetle mortality study of |Bliss| 
(1935), a common example used to illustrate binomial regression. In the original data set, 481 beetles were 


each administered one of eight different doses (in mg/L) of carbon disulphate. We broadly follow the case 
study analysis of O’Hagan and Forster (2004 pp. 423-433), who reproduced the data, and assume interest 
lies in providing a mo del-averaged posterior distribution of the lethal dose 50 (LD50), the dose required to 
achieve 50% mortality. 

We assume that the binary indicator of death for each beetle is an independent Bernoulli random variable. 
The number, y^, of deaths from dose x*, is modelled as yk ~ Binomial(rifc, Pk), where pk is the probability 
of death for the fcth dose which was administered to rik beetles, i n fc = 481. We denote the link 
function by g(pk) = Vk, with r/k the linear predictor and consider six models formed by the Cartesian 
product of three link functions and two linear predictors: the logit, g{pk) = log{p/-/(l — Pk)}, the c-log-log, 
g{Pk) = log{-log(l — Pk)}, and the probit, g(pk) = 4>~ 1 { j Ofc}, with ${•} the standard normal distribution 
function; and 1st order (rji = /3 0 + Pi%i) and 2nd order (?yj = P 0 + Pi x^ + P 2 X 1 ) linear predictors. 

Let u £U = {1,..., 6} denote the model indicator (see Table [l]) and let (3 U denote the vector of regression 
parameters under model u. LD50 is then given by 


LD(Pu) = 


W-P 0 

th _ 

~Pl + \/ Pl-^p2(Po-w) 
2/3 2 


for u = 1, 3, 5 (1st order linear predictor) 
otherwise, 
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Figure 3: Results from 20 evaluations of (a) U s {8) for SIG-optimal, pseudo-Bayesian D-optimal and maximin 
Latin hypercube designs, and (b) —U v (5) for NSEL-optimal, pseudo-Bayesian A-optimal and maxmin Latin 
hypercube designs, for homogenous logistic regression; (c) and (d) show the same evaluations for hierarchical 
logistic regression. For the latter two plots, for each value of n, 20 different random assignments are made 
of the points of the Latin hypercube design to the G groups, and each resulting design is evaluated 20 
times. For each design, the central plotting symbol denotes the mean expected Shannon information gain 
or expected average posterior variance, with the two horizontal lines denoting the minimum and maximum 
of these quantities. 
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where w = log {—log (0.5)} for the c-log-log link function, and 0 otherwise. We use unit information prior 
distributions (Ntzoufras et al.[ 2003) for f3 u \ u under each model and set tt(u ) = 1/6 for u = 1,... ,6. The 


posterior model probabilities for each model are approximated using importance sampling to evaluate the 
marginal likelihood of each model, and given in Table [I] Samples from the posterior distribution of the 
model parameters are generated for each of the six models using the Metropolis-Hastings algorithm, and 
then weighted by 7 r(u|y) to produce a sample from the joint posterior distribution (3 ul u\y of regression 
parameters and model indicator. A sample from the model-averaged posterior distribution of LD50 can be 
obtained by evaluating LD{(3 U ) for each sampled parameter vector. 


Table 1: Approximate posterior model probabilities, ^ r(u | y), for the beetle mortality data. 


u 

Link Function 

Linear Predictor 

tt(m | y) 

1 

Logit 

1st order 

0.0216 

2 

Logit 

2nd order 

0.0686 

3 

C-log-log 

1st order 

0.7580 

4 

C-log-log 

2nd order 

0.0612 

5 

Probit 

1st order 

0.0304 

6 

Probit 

2nd order 

0.0602 


We consider the design of a follow-up experiment using a further no (potentially new) doses. Each dose is 
to be administered to nok 0 beetles (ko = 1 ,... , no) and, in each group, the number, yofe 0 > of beetles that die 
is recorded. Let yo be the no x 1 vector of the numbers of beetles that die in the follow-up experiment. We 
assume that nok 0 is unknown and adopt a Poisson(A) prior distribution. Hence yok 0 ~ Poisson(A/9*, 0 ). We 
choose A = 60, consistent with the values of nk in the original dataset, and find designs for n 0 = 1,..., 10 
to estimate the value of LD50 under the NSEL utility function by maximizing 

U v (6) = -^7 r(u|y) [ [ [LD(/3 U ) - E {LD((3 U ) | y 0 , y, <5)] 2 tt(/3 Ui y 0 | u, y)d/3„dy 0 , 
u =i J y Jb - 

where design S is the no x 1 vector of doses and B u is the parameter space for model u. For the purposes of 
design and modelling, we assume that 5i € T>i = [—1,1] for all* = 1 ,..., no but transform the doses to the 
original scale [1.6907,1.8839] for the presentation of results. 

We can approximate U v (6) by 


U v (5) = E [ LD ^m) - E (LD((3 u ) | y 0 py, S) 
n 1=1 


where {/3 u i, ui, yo/}^Li is a random sample from the joint distribution with density 7r(/3 u , yo | y), and 


E (LD(f3 m ) | y 0 , y, d) 


Ef=i LD (PubM yo I flub' m b ) 

Ef=i ’’’(yo I rhb) 


where hi b j is a random sample generated from the joint distribution with density 7 t(/ 3 u , u \ y). 

Figure [4] summarises the results from the ACE algorithm. The doses in the NSEL-optimal design lie in 
the lower tail of the (original) posterior distribution of LD50 for all values of no, see Figure |4|a). For no > 1, 
the doses are concentrated near a single point (1.77), for example four replicate points occur for n 0 = 10. 
The approximate expected posterior variance of LD50, — [/ v (S), rapidly decreases as no is initially increased 
from 1, see Figure |4](b); the rate of decrease slows as no becomes larger. 
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(a) 
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Original" 
n 0 = 1 
n 0 = 2 
n 0 = 3 
n 0 = 4 
n 0 = 5 
n 0 = 6 
n 0 = 7 
n 0 = 8 
n 0 = 9 
n 0 = 10 


1.70 1.75 1.80 1.85 1.90 

Dose 




(d) 



— i-1-1-1 

1.70 1.75 1.80 1.85 

Dose 1 


1.6e^05 

Figure 4: (a) Posterior density for LD50, the original experimental doses and optimal doses (in mg/L) for 
each value of no; (b) boxplots of 20 evaluations of — U v (8 *) for each n 0 for the NSEL-optimal designs; (c) 
negative approximate expected utility —U v (S) against dose for no = 1; the vertical line indicates S*. (d) 
negative approximate expected utility —[7 V (. 5 ) against dose for n 0 = 2; IE indicates S*. 



1.8e-05 2.0e-05 2.2e-05 

Expected posterior variance 
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To further investigate the selected designs, the expected utility surface and the performance of the ACE 
algorithm, we randomly generated 10,000 designs for no = 1 and n o = 2 uniformly from T>\ and r D\ 1 
respectively. For each design, we evaluate —U V (S) and plot against dose; see Figure |4fc) for no = 1 and 
Figure |4jd) for n 0 = 2. The NSEL design identified by ACE is marked in each plot and, for both values of 
no, the minimum negative expected utility is achieved. The variance of the original model-averaged posterior 
distribution for LD50 is 2.10 x 10 -5 . Hence for both no = 1 and no = 2, it is clear that choosing a design 
composed of only very high or low doses would have resulted in a negligible expected reduction in variance. 


4 Discussion and future work 


The ACE methodology proposed in this paper provides a step-change in the nature and complexity of 
statistical models and experiments for which Bayesian designs can be obtained. It may be used to find 
decision-theoretic designs whenever it is possible to simulate values from the prior distribution of the model 
parameters and responses from the statistical model. The combination of emulating an approximation to the 
expected utility and the coordinate exchange algorithm has allowed much larger problems to be tackled than 
was previously possible, both greater numbers of runs and more controllable variables. The algorithm also 
matches or exceeds the performance of existing approaches for smaller problems, and offers a clear advantage 
for design selection over the application of a dimension reduction scheme. The new designs made possible by 
this methodology also allow previously impossible benchmarking of designs from asymptotic approximations. 

As presented, ACE can be applied to numerous important practical problems using the available R 
package. We have applied, or are in the process of applying, ACE to problems from chemical development 
and biological science. There are also a variety of extensions that could be made to ACE to increase its 
computational efficiency and applicability. We now highlight a few of these areas. 

In ongoing work, we are extending and applying the methodology to find designs for statistical models 
where the likelihood function is only available numerically as the output from an expensive computer code 


(see also Huan and Marzouk 2013). Such models include those described by the solution to a system of 
non-linear differential equations, which are increasingly studied in the field of uncertainty quantification (e.g. 


Chkrebtii et al. 20151. 


Convergence of the algorithm may be improved through a reparameterization of the design to remove 
dependencies between coordinates (e.g. Fletcher 1987 p. 19) that can be evident in efficient designs for 
some models. Such dependencies could be identified through pilot runs of the algorithm or by studying 
properties of pseudo-Bayesian designs. Additionally, the computational burden of the algorithm could be 
further reduced by employing alternative approaches to perform each one-dimensional optimization step in 
the algorithm. For example, a sequential strategy could use an expected improvement criterion modified for 


stochastic responses (e.g. Pichney et al. 2013). 


Alternative strategies could also be adopted for the approximation of the expected utility. Zero-variance 

129-132, 


Monte Carlo (Ripley 1987 


pp. 


Mira et al. 2013) could be used to reduce the variance of the 


Monte Carlo estimator through the introduction of negative correlations via antithetic variables. Combining 
deterministic approximations, such as expectation propagation, with Monte Carlo methods would remove the 
need for nested simulation and may work well for nonlinear regression models with normal prior distributions. 


Acknowledgements 

This work was supported by the U.K. Engineering and Physical Sciences Research Council through Fellowship 
EP/J018317/1 for D.C. Woods. The authors thank the participants at the “Bayesian Optimal Design of 
Experiments” workshop (Brisbane, Australia, December 2015; http://www.bode2015.wordpress.com) for 
useful discussions on extensions and future work. The authors acknowledge the use of the IRIDIS High 
Performance Computing Facility, and associated support services at the University of Southampton, in the 
completion of this work. 


17 




















References 


Amzal, B., Bois, F. Y., Parent, E., and Robert, C. (2006), “Bayesian optimal design via interacting particle 
systems,” Journal of the American Statistical Association, 101, 773-785. 

Atkinson, A., Donev, A., and Tobias, R. (2007), Optimum Experimental Design, with SAS, Oxford: Oxford 
University Press. 

Atkinson, A. C., Chaloner, K., Herzberg, A. M., and Juritz, J. (1993), “Experimental designs for properties 
of a compartmental model,” Biometrics, 49, 325-337. 

Bastos, L. S. and O’Hagan, A. (2009), “Diagnostics for Gaussian process emulators,” Technometrics, 51, 
425-438. 

Bliss, C. I. (1935), “The calculation of the dosage-mortality curve,” Annals of Applied Biology, 22, 134-167. 

Chaloner, K. and Larntz, K. (1989), “Optimal Bayesian design applied to logistic regression experiments,” 
Journal of Statistical Planning and Inference, 21, 191-208. 

Chaloner, K. and Verdinelli, I. (1995), “Bayesian experimental design: a review,” Statistical Science, 10, 
273-304. 

Chkrebtii, O. A., Campbell, D. A., Calderhead, B., and Girolami, M. (2015), “Bayesian solution uncertainty 
quantification for differential equations,” arXiv:1306.2365. 

Dean, A., Morris, M., Stufken, J., and Bingham, D. (eds.) (2015), Handbook of Design and Analysis of 
Experiments, Boca Raton: CRC press. 

Fletcher, R. (1987), Practical Methods of Optimization, Chichester: Wiley, 2nd ed. 

Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. (2014), Bayesian Data Analysis, 
Boca Raton: CRC, 3rd ed. 

Gotwalt, C. M., Jones, B. A., and Steinberg, D. M. (2009), “Fast computation of designs robust to parameter 
uncertainty for nonlinear settings,” Technometrics, 51, 88-95. 

Hamada, M., Martz, H. F., Reese, C. S., and Wilson, A. G. (2001), “Finding near-optimal Bayesian experi¬ 
mental designs via genetic algorithms,” The American Statistician, 55, 175-181. 

Huan, X. and Marzouk, Y. M. (2013), “Simulation-based optimal Bayesian experimental design for nonlinear 
systems,” Journal of Computational Physics, 232, 288-317. 

Jones, M., Goldstein, M., Jonathan, P., and Randell, D. (2016), “Bayes linear analysis for Bayesian optimal 
experimental design,” Journal of Statistical Planning and Inference, 171, 115-129. 

Loeppky, J. L., Sacks, J., and Welch, W. J. (2009), “Choosing the sample size of a computer experiment: a 
practical guide,” Technometrics, 51, 366-376. 

Meyer, R. and Nachtsheim, C. (1995), “The coordinate exchange algorithm for constructing exact optimal 
experimental designs,” Technometrics, 37, 60-69. 

Mira, A., Solgi, R., and Imparato, D. (2013), “Zero variance Markov chain Monte Carlo for Bayesian 
estimators,” Statistics and Computing, 23, 653-662. 

Morris, M. D. and Mitchell, T. J. (1995), “Exploratory designs for computer experiments,” Journal of 
Statistical Planning and Inference, 43, 381-402. 

Muller, P. (1999), “Simulation-based optimal design,” in Bayesian Statistics 6, eds. Bernardo, J. M., Bayarri, 
M. J., Berger, J. O., Dawid, A. P., Heckerman, D., and Smith, A. F. M., Oxford. 


18 



Muller, P. and Parmigiani, G. (1996), “Optimal design via curve fitting of Monte Carlo experiments,” Journal 
of the American Statistical Association , 90, 1322-1330. 

Muller, P., Sanso, B., and De Iorio, M. (2004), “Optimal Bayesian design by inhomogeneous Markov chain 
simulation,” Journal of the American Statistical Association, 99, 788-798. 

Ntzoufras, I., Dellaportas, P., and Forster, J. J. (2003), “Bayesian variable and link determination for 
generalised linear models,” Journal of Statistical Planning and Inference, 111, 165-180. 

O’Hagan, A. and Forster, J. J. (2004), Kendall’s Advanced Theory of Statistics, vol. 2B: Bayesian Inference, 
John Wiley & Sons, 2nd ed. 

Pawitan, Y. (2001), In All Likelihood: Statistical Modelling and Inference using Likelihood, Oxford: Oxford 
University Press. 

Pichney, V., Ginsbourger, D., Richet, Y., and Caplin, G. (2013), “Quantile-based optimization of noisy 
computer expeirments with tunable precision,” Technometrics, 55, 2-13. 

Rasmussen, C. E. and Williams, C. K. I. (2006), Gaussian Processes for Machine Learning, Cambridge, 
MA.: MIT Press. 

Ripley, B. D. (1987), Stochastic Simulation, New York: Wiley. 

Ryan, E. G., Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2016), “A review of modern computational 
algorithms for Bayesian optimal design,” International Statistical Review, 84, 128-154. 

Ryan, E. G., Drovandi, C. C., Thompson, M. H., and Pettitt, A. N. (2014), “Towards Bayesian experimental 
design for nonlinear models that require a large number of sampling times,” Computational Statistics and 
Data Analysis, 70, 45-60. 

Ryan, K. J. (2003), “Estimating expected information gains for experimental designs with application to the 
random fatigue-limit model,” Journal of Computational and Graphical Statistics, 12, 585-603. 

Santner, T. J., Williams, B. J., and Notz, W. I. (2003), The Design and Analysis of Computer Experiments, 
New York: Springer. 

Severini, T. A. (2000), Likelihood Methods in Statistics, Oxford: Oxford University Press. 

Waite, T. W. and Woods, D. C. (2015), “Designs for generalized linear models with random block effects 
via information matrix approximations,” Biometrika, 102, 677-693. 

Wang, L. and Zhang, L. (2006), “Stochastic optimization using simulated annealing with hypothesis test,” 
Applied Mathematics and Computation, 174, 1329-1342. 

Weaver, B. P., Williams, B. J., Anderson-Cook, C. M., and Higdon, D. M. (2016), “Computational enhance¬ 
ments to Bayesian design of experiments using Gaussian processes,” Bayesian Analysis, 11, 191-213. 

Woods, D. C., Lewis, S. M., Eccleston, J. A., and Russell, K. G. (2006), “Designs for generalized linear 
models with several variables and model uncertainty,” Technometrics, 48, 284-292. 

Wu, C. F. J. and Hamada, M. (2009), Experiments: Planning, Analysis and Optimization, Hoboken, New 
Jersey: Wiley, 2nd ed. 


19 



